Frugal Streaming for Estimating Quantiles: One (or two) memory suffices
Authors
Abstract
Modern applications require processing streams of data to estimate statistical quantities such as quantiles with a small amount of memory. In many such applications, one in fact needs to compute such quantities for each of a large number of groups, which further restricts the memory available to the stream of any particular group. We address this challenge and introduce frugal streaming, that is, algorithms that work with a tiny (typically sub-streaming) amount of memory per group. We design a frugal algorithm that uses only one unit of memory per group to compute a quantile for each group. For stochastic streams, where data items are drawn independently from a distribution, we analyze the algorithm and show that it finds an approximation to the quantile rapidly and remains stably close to it. We also propose an extension of this algorithm that uses two units of memory per group. Extensive experiments with real-world data from HTTP traces and Twitter show that our frugal algorithms are comparable to existing streaming algorithms for estimating any quantile, even though those existing algorithms use far more space per group and are unrealistic in frugal applications; further, the two-memory frugal algorithm converges significantly faster than the one-memory algorithm.
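To make the one-memory idea concrete, here is a minimal Python sketch of an estimator that keeps a single number per stream. The update rule (nudge the stored estimate up or down by one, with probabilities tied to the target quantile) is an assumption about how such an estimator can look; the abstract does not spell it out. The name `frugal_quantile`, the unit step size, and the starting value of 0 are illustrative choices, not taken from the source.

```python
import random


def frugal_quantile(stream, q, estimate=0, rng=None):
    """Track an approximate q-quantile of `stream` using one stored number.

    Assumed update rule (illustrative, not quoted from the abstract):
    move the estimate up by 1 with probability q when an item is larger,
    and down by 1 with probability 1 - q when an item is smaller.
    """
    rng = rng or random.Random()
    for x in stream:
        r = rng.random()
        if x > estimate and r > 1 - q:
            estimate += 1   # item above the estimate: step up with prob. q
        elif x < estimate and r > q:
            estimate -= 1   # item below the estimate: step down with prob. 1 - q
    return estimate


if __name__ == "__main__":
    # Hypothetical demo data: integers drawn uniformly from 0..1000.
    data_rng = random.Random(0)
    data = [data_rng.randint(0, 1000) for _ in range(20000)]
    print("median estimate:", frugal_quantile(data, 0.5, rng=random.Random(1)))
    print("90th percentile estimate:", frugal_quantile(data, 0.9, rng=random.Random(2)))
```

Under this rule the upward and downward moves balance when a fraction q of the items falls below the estimate, which is why the value drifts toward the target quantile. In a per-group setting one would keep one such number per group, for example in a dictionary keyed by group identifier.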
Similar resources
Frugal Streaming for Estimating Quantiles
Modern applications require processing streams of data to estimate statistical quantities such as quantiles with a small amount of memory. In many such applications, one in fact needs to compute such quantities for each of a large number of groups (e.g., network traffic grouped by source IP address), which additionally restricts the amount of memory available for the stream for any...
Estimating Quantiles from the Union of Historical and Streaming Data
Modern enterprises generate huge amounts of streaming data, for example, micro-blog feeds, financial data, network monitoring and industrial application monitoring. While Data Stream Management Systems have proven successful in providing support for real-time alerting, many applications, such as network monitoring for intrusion detection and real-time bidding, require complex analytics over his...
Estimating Aggregate Properties on Probabilistic Streams
The probabilistic-stream model was introduced by Jayram et al. [16]. It is a generalization of the data stream model that is suited to handling "probabilistic" data where each item of the stream represents a probability distribution over a set of possible events. Therefore, a probabilistic stream determines a distribution over potentially a very large number of classical "deterministic" streams...
Fast Algorithm for Computing Weighted Projection Quantiles, Quantile Regression and Data Depth for High-Dimensional Large Data Clouds
In this paper we present a new algorithm based on weighted projection quantiles for fast and frugal real-time quantile estimation of large, high-dimensional data clouds. First, we present a projection quantile regression algorithm for high-dimensional data. Second, we present a fast algorithm for computing the depth of a point or a new observation in relation to any high-dimensional data cloud,...
A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data
The φ-quantile of an ordered sequence of data values is the element with rank φn, where n is the total number of values. Accurate estimates of quantiles are required for the solution of many practical problems. In this paper, we present a new algorithm for estimating the quantile values for disk-resident data. Our algorithm has the following characteristics: (1) It requires only one pass over ...
Journal title:
- CoRR
Volume: abs/1407.1121
Pages: -
Publication date: 2012